-
Notifications
You must be signed in to change notification settings - Fork 28
INTPYTHON-751 Make query generation omit $expr unless required #396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
529e0ff
to
a78f26b
Compare
8b0c247
to
141f1cf
Compare
d11378a
to
2c48d11
Compare
Substr.as_mql = substr | ||
Trim.as_mql = trim("trim") | ||
TruncBase.as_mql = trunc | ||
Cast.as_mql_expr = cast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the function does not support as_mql_path
. It could be added latter if we try to simplify constants expressions
return value | ||
|
||
|
||
def base_expression(self, compiler, connection, as_path=False, **extra): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the common handler for all expressions. It defines if an expr
is needed or not.
KeyTransformExact.as_mql_expr = key_transform_exact_expr | ||
KeyTransformExact.as_mql_path = key_transform_exact_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check alphabetization of the classes and functions (not only in this file).
django_mongodb_backend/base.py
Outdated
} | ||
|
||
def range_match(a, b): | ||
## TODO: MAKE A TEST TO TEST WHEN BOTH ENDS ARE NONE. WHAT SHALL I RETURN? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AI says, "If either the start or end value provided to the BETWEEN operator is NULL, the entire BETWEEN condition will typically evaluate to UNKNOWN (and thus FALSE in a WHERE clause), unless explicitly handled." (I confirmed this for SQLite and PostgreSQL)
However, to match the semantics implemented here where None is treated as min/max date, I would expect __range=[None, None]
not to filter any values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raise FullResultSet is the way to go, and EmptyResultSet if the low
is bigger than high
.
Aggregate.as_mql_expr = aggregate | ||
Count.as_mql_expr = count | ||
StdDev.as_mql_expr = stddev_variance | ||
Variance.as_mql_expr = stddev_variance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we have as_mql_expr()
, as_mql_path()
, and as_mql(..., as_path=...)
. If this is the way we keep it, it would be good to explain in the design document which objects (aggregate, func, expression, etc.) get which.
I wonder about renaming as_mql_expr()
or as_mql_path()
to as_mql()
(i.e. treating one of paths as the default). Do you think it would be more or less confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was the idea. I’ll explain it in the docs, and we might also consider renaming some methods. The core concept is:
- Every expression has an
as_mql
method. - In some cases, it’s simpler to implement
as_mql
directly, so those methods don’t follow the common expression flow. - For other expressions,
as_mql
is a composite function that delegates toas_path
oras_expr
when applied. - The
base_expression.as_mql
method controls when these are called and performs boilerplate checks to prevent nesting anexpr
inside anotherexpr
(a MongoDB 6 restriction).
In short: every object has as_mql
. Some also define as_path
and as_expr
. The base_expression
coordinates how these methods are used, except for cases where as_mql
is defined directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc here: link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timgraham I actually like the decoupling of as_mql
and as_mql_(path | expr)
. I view it as, you need to define at least two:
as_mql
and as_mql_expr
.
Then if you want to have the more optimized function you define as_mql_path
. It feels less confusing to me that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 An expression needs at least one of the methods, as_mql_expr is enough (not optimal but functional). The aggregation module is a good example. Because the as_mql is defined in baseExpression class. In this approach most of the expressions don't need as_mql (if as_mql is defined, expr or path aren't needed).
EDIT: Sorry, I rushed the answer before read all the text, you got the point well. 😬
5827580
to
5b9fa93
Compare
conditions.append({a: {"$lte": b[1]}}) | ||
if start is not None and end is not None: | ||
if isinstance(start, Decimal128): | ||
start = start.to_decimal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potencial overflow risk here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a failing test without the to_decimal()
calls? I'm not sure why there would be a mix of Decimal/Decimal128 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a BSON type held at the database and returned locally. I think it should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there is: model_fields_.test_polymorphic_embedded_model.QueryingTests.test_range
. Maybe this case shouldn't be handled here, and let the query optimizer to remove those False
cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left my first round of comments. I'm going to review the tests as a second phase of the PR.
django_mongodb_backend/base.py
Outdated
if b: | ||
return {"$or": [{a: {"$exists": False}}, {a: None}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does b represent in this case? Not negating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 well, it is a bit confusing here. but this function checks for nullability. So, if b is True then we check null. maybe the parameter should be: field, null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field and is_null are good clarifiers
Aggregate.as_mql_expr = aggregate | ||
Count.as_mql_expr = count | ||
StdDev.as_mql_expr = stddev_variance | ||
Variance.as_mql_expr = stddev_variance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timgraham I actually like the decoupling of as_mql
and as_mql_(path | expr)
. I view it as, you need to define at least two:
as_mql
and as_mql_expr
.
Then if you want to have the more optimized function you define as_mql_path
. It feels less confusing to me that way.
conditions.append({a: {"$lte": b[1]}}) | ||
if start is not None and end is not None: | ||
if isinstance(start, Decimal128): | ||
start = start.to_decimal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a BSON type held at the database and returned locally. I think it should be good.
def as_mql_expr(self, compiler, connection): | ||
lhs_mql = process_lhs(self, compiler, connection, as_path=False) | ||
value = process_rhs(self, compiler, connection, as_path=False) | ||
return {"$gte": [lhs_mql, value]} | ||
|
||
def as_mql_path(self, compiler, connection): | ||
lhs_mql = process_lhs(self, compiler, connection, as_path=True) | ||
value = process_rhs(self, compiler, connection, as_path=True) | ||
return {lhs_mql: {"$gte": value}} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a $gte
query in a search.text
lookup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To convert a score function into a filter I decided to express the following proposition: score_func(...) > 0
.
return self.is_simple_column | ||
|
||
@cached_property | ||
def is_simple_column(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this validation for embedded models used in multiple places. Can you consolidate this to the query_utils file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 will try. The overall structure of the function is similar, but the type varies. I could create a meta-function that generates the appropriate function for a given type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get better. 😬 . I will let as it is. there is some details like:
previous._field.column
, previous.key_name
when extract the field_name or the path. Then are different from EMFA and EMF the return, one has to validate the inner transform while the other doesn't
django_mongodb_backend/compiler.py
Outdated
where = self.get_where() | ||
try: | ||
expr = where.as_mql(self, self.connection) if where else {} | ||
match = where.as_mql(self, self.connection, as_path=True) if where else {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we enforce as_path=True
here, can we set that as the default on WhereNode.as_mql(..., as_path=True)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will refactor. Also it may change the name, as_expr=False. but it is scenically the same
|
||
|
||
def process_lhs(node, compiler, connection): | ||
def process_lhs(node, compiler, connection, as_path=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the delegation logic, should as_path
always default to True
?
If the caller requests a path filter (as_path=True) and the expression supports it, as_mql_path is executed.
If the expression cannot be represented as a path filter but as_path=True, it is automatically encapsulated in $expr.
If the caller requests an expression (as_path=False), as_mql_expr is executed directly. No additional encapsulation is needed, since the query generation ensures proper nesting of $expr.
if isinstance(value, CombinedExpression): | ||
# Temporary: treat all CombinedExpressions as non-constant until | ||
# constant cases are handled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a follow-up ticket for this? If it already exists, could you link it in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
def valid_path_key_name(key_name): | ||
return bool(re.fullmatch(r"[A-Za-z0-9_]+", key_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.mongodb.com/docs/manual/core/dot-dollar-considerations/
Values like hashtags are also valid for path names and don't require $expr
. To my knowledge so long as it's not (.) or ($) it's good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will adjust the expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if there is some emoji or some non ascii character? 🤔
def as_mql_expr(self, compiler, connection): | ||
columns, parent_field = self._get_target_path() | ||
mql = parent_field.as_mql(compiler, connection) | ||
for key in columns: | ||
mql = {"$getField": {"input": mql, "field": key}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially out of scope:
https://github.com/mongodb/django-mongodb-backend/pull/392/files#diff-0a6ce30a131a00fa88086c4c4d0d6e6232845fd11ef2bc67891fdf92e10c3743R18-R45
Is it possible to still remove $getField
in as_mql_expr
or is it expected that routing to as_mql_expr
for embedded model queries is because of needing a getField
call?
@property | ||
def can_use_path(self): | ||
simple_column = getattr(self.lhs, "is_simple_column", False) | ||
constant_value = is_constant_value(self.rhs) | ||
return simple_column and constant_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
django_mongodb_backend/base.py
Outdated
if b: | ||
return {"$or": [{a: {"$exists": False}}, {a: None}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field and is_null are good clarifiers
self.assertAggregateQuery( | ||
query, | ||
"model_fields__nullableintegerarraymodel", | ||
[{"$match": {"field": {"$in": ([1], [2])}}}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does $in now expect a tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it wasn’t a new convention or expectation. There was already a test that checks the RHS $in
as a tuple, so I just followed that convention. It doesn’t affect the query behavior.
"$match": { | ||
"$expr": { | ||
"$eq": [ | ||
{"$getField": {"input": "$data", "field": "integer_"}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of a value that could get rid of the getField.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could get rid of those getField if we use $data.integer_
but I thought it was out of scope for this refactor. This behavior is the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's out of scope!
[ | ||
{ | ||
"$match": { | ||
"$expr": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious to the callback chain on this one since the null-check could actually be converted.
May best be an improvement added later
Doc here
In this PR a unified approach for generating MQL from Django expressions was implemented. The core idea is to centralize the control flow in a
base_expression
method, which decides whether the expression can be translated into a directfield: value
match (index-friendly) or must fall back to$expr
. This keeps the logic for wrapping and dispatching in one place, while each lookup/function only defines its own expression-building logic.This approach also allows mixing direct
field: value
matches with$expr
clauses within the same$match
. As a result, multiple$expr
entries may coexist alongside index-optimized conditions, depending on the shape of the query.Most lookups now follow this pattern by simply implementing
as_mql_expr
(and optionallyas_mql_path
when a match-based translation is possible). Only a few special cases likeCol
,Func
operators (except theKeyTransform
) , and many more, override the base behavior directly. This structure also leaves room for future optimizations (e.g. constant folding) without having to change the overall flow.Additionally, since MongoDB 6 does not allow nesting
$expr
inside another$expr
, the flow inbase_expression
ensures that such cases are flattened. In practice, expressions are generated without redundant wrapping, so the final MQL never contains$expr
within$expr
.NOTE: Some polish will be made, but the main idea and the majority of the code is already rendered.